Even in a simple linear regression there is a Variance-Covariance matrix for the two parameters of the model: the intercept and the slope. The model here is:
\[ y_i = \beta_0 + \beta_1 x_i + e_i \]
It is natural for these parameters to be correlated because extreme combinations of both are unlikely: when the estimated intercept is high, the estimated slope tends to be low, and vice versa. Here is a simple non-Bayesian example:
## Simulate data
xx <- 1:20
yy <- 3 + 1.5 * xx + rnorm(length(xx))
dat <- data.frame(x = xx, y = yy)
## Visualize
ggplot(data = dat, aes(x = x, y = y)) +
geom_point()
## Fit a linear model
fit <- lm(y ~ x, data = dat)
## Beta coefficients
coef(fit)
## (Intercept) x
## 2.763020 1.526467
## Variance-Covariance for beta
vcov(fit)
## (Intercept) x
## (Intercept) 0.24087609 -0.017625080
## x -0.01762508 0.001678579
## Correlation matrix
cov2cor(vcov(fit))
## (Intercept) x
## (Intercept) 1.0000000 -0.8765231
## x -0.8765231 1.0000000
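The negative correlation seen above is not an accident of this dataset; for ordinary least squares it follows from the standard closed form of the estimator's covariance:

\[ Cov(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}, \qquad Cov(\hat{\beta}_0, \hat{\beta}_1) = \frac{-\sigma^2 \, \bar{x}}{\sum_i (x_i - \bar{x})^2} \]

Since the covariate here is \(x = 1, \ldots, 20\), \(\bar{x} > 0\) and the covariance between intercept and slope must be negative.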
## Visualize model fits
set.seed(1234)
fit.s <- simulate_lm(fit, nsim = 5, value = "data.frame")
ggplot(data = fit.s, aes(x = x, y = y)) +
geom_line(aes(y = sim.y, group = ii), color = "purple", alpha = 0.3) +
geom_point()
## We can also visualize the negative correlation among the parameters
fit.b <- boot_lm(fit)
hist(fit.b, 1, ci = "perc")
hist(fit.b, 2, ci = "perc")
## Visualizing the correlation
plot(fit.b$t, xlab = "Intercept", ylab = "Slope")
The function simulate_lm internally uses MASS::mvrnorm, which can simulate samples from a multivariate normal distribution. boot_lm performs bootstrapping, which involves simulating data similar to the observed data and re-fitting the model (by default 1000 times). This is a way of empirically deriving the distribution of the parameters.
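The core idea can be sketched with MASS::mvrnorm alone (a simplified sketch, not the actual nlraa code): draw new coefficient vectors from a multivariate normal centered at the estimates, with the estimated variance-covariance matrix.

```r
library(MASS)
## Recreate a dataset like the one above
set.seed(101)
xx <- 1:20
yy <- 3 + 1.5 * xx + rnorm(length(xx))
fit <- lm(yy ~ xx)
## 1000 draws of (intercept, slope) from the estimated sampling distribution
beta.sim <- mvrnorm(n = 1000, mu = coef(fit), Sigma = vcov(fit))
## The empirical correlation of the draws is strongly negative,
## close to the value returned by cov2cor(vcov(fit))
cor(beta.sim)[1, 2]
```

Plotting `beta.sim` reproduces the elongated, negatively tilted cloud that `plot(fit.b$t)` shows for the bootstrap.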
Here is how to fit a simple Bayesian linear regression using brms:
priors <- prior(normal(0, 5), class = "Intercept") +
prior(normal(0, 4), coef = "x")
## Fit model
fbrms <- brm(y ~ x, data = dat, prior = priors, refresh = 0)
## Compiling the C++ model
## Start sampling
## Plotting posterior distributions of parameters
plot(fbrms, "b_")
## Extract posterior samples
ps <- posterior_samples(fbrms)
## Simple plot
plot(ps[,1:2], xlab = "Intercept", ylab = "Slope")
## Or using pairs
pairs(fbrms, "^b_")
## We can extract equivalent quantities
vcov(fbrms)
## Intercept x
## Intercept 0.32344397 -0.023751584
## x -0.02375158 0.002240561
cov2cor(vcov(fbrms))
## Intercept x
## Intercept 1.0000000 -0.8822967
## x -0.8822967 1.0000000
data(barley, package = "nlraa")
## Fit one regression for each year, with year as random
barley$NF2 <- barley$NF^2
br.lme <- lme(yield ~ NF + NF2, random = ~ NF + NF2 | year, data = barley)
## Fixed or population-level effects
fixef(br.lme)
## (Intercept) NF NF2
## 133.422688 35.440069 -1.391495
## Variance-covariance for beta
vcov(br.lme)
## (Intercept) NF NF2
## (Intercept) 165.214652036 -3.0824699 -0.007421724
## NF -3.082469869 5.7281654 -0.265883674
## NF2 -0.007421724 -0.2658837 0.019519108
cov2cor(vcov(br.lme))
## (Intercept) NF NF2
## (Intercept) 1.000000000 -0.1001998 -0.004132859
## NF -0.100199784 1.0000000 -0.795158888
## NF2 -0.004132859 -0.7951589 1.000000000
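The strong negative correlation between the NF and NF2 coefficients is the usual consequence of collinearity: over a positive range, a covariate and its square are highly correlated, so their coefficient estimates trade off against each other. A quick illustration with a generic dose vector (hypothetical rates, not the actual barley nitrogen levels):

```r
## Hypothetical nitrogen rates, for illustration only
nf <- seq(0, 250, by = 50)
## A covariate and its square are nearly collinear over a positive range
cor(nf, nf^2)
## close to 1, which is why the NF and NF2 estimates are strongly
## negatively correlated
```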
## Bayesian way
prs <- prior(normal(100, 50), class = Intercept) +
prior(normal(50, 15), class = "b", coef = "NF") +
prior(normal(0, 5), class = "b", coef = "NF2") +
prior(student_t(2, 0, 150), class = "sd")
## Fit the multilevel model
br.brms <- brm(yield ~ NF + NF2 + (NF + NF2 | year),
data = barley, refresh = 0, iter = 10000,
cores = 4,
seed = 1234,
prior = prs, control = list(adapt_delta = 0.95,
max_treedepth = 15))
## Compiling the C++ model
## Start sampling
## Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
## I get some complaints, but the model fit seems fine
plot(br.brms, "^b_")
pairs(br.brms, "^b_")
pp <- predict(br.brms)
pp2 <- cbind(barley, pp)
## Plot
ggplot(data = pp2, aes(NF, yield)) +
geom_point() +
facet_wrap(~year) +
geom_line(aes(y = Estimate)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5), fill = "purple", alpha = 0.3)
Covariances arise any time we have two or more random variables. For a single variable we have a mean and a variance.
\[ \mu = E(X) \\ \sigma^2 = E[(X - E(X))^2] \] When we have two variables \(X_1\) and \(X_2\), there is in addition a covariance, \(\sigma_{12} = E[(X_1 - E(X_1))(X_2 - E(X_2))]\), and the variance-covariance matrix of the pair is
\[ Cov(X_1, X_2) = \left[ \begin{array}{ll} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{array} \right] \] In the simplest of linear regression models we have
\[ Cov(Y) = Cov(e) = \sigma^2 \times I_{n \times n} \]
However, this covariance can be modeled in much more complex ways when the data have more structure.
library(nlme)
library(nlraa)
library(ggplot2)
data(ChickWeight)
dim(ChickWeight)
## [1] 578 4
fm1 <- lm(weight ~ Time, data = ChickWeight)
ggplot(data = ChickWeight, aes(Time, weight)) +
geom_point() +
ggtitle("Chick weight with time")
ggplot(data = ChickWeight, aes(Time, weight, color = Chick)) +
geom_point() +
geom_line(aes(y = fitted(fm1)), color = "black") +
theme(legend.position = "none")
## Fit a linear model with an increasing variance
## and a CAR1 correlation structure
fit4 <- gls(weight ~ Time, data = ChickWeight,
weights = varPower(),
correlation = corCAR1(form = ~ Time | Chick))
v4 <- var_cov(fit4)
## Tip: you can visualize these matrices using
image(log(v4[,ncol(v4):1]))
## Take a closer look
v42 <- v4[1:36, 1:36]
image(log(v42[,ncol(v42):1]))
## Look at raw numbers
round(v4[1:12, 1:12])
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 89 129 178 236 302 377 462 555 658 770 890 954
## [2,] 129 194 267 353 452 565 692 832 985 1152 1333 1428
## [3,] 178 267 379 501 642 803 982 1181 1400 1637 1893 2029
## [4,] 236 353 501 684 877 1096 1341 1613 1911 2235 2585 2769
## [5,] 302 452 642 877 1160 1449 1774 2133 2527 2956 3418 3663
## [6,] 377 565 803 1096 1449 1869 2287 2750 3259 3811 4408 4723
## [7,] 462 692 982 1341 1774 2287 2888 3473 4115 4813 5566 5964
## [8,] 555 832 1181 1613 2133 2750 3473 4310 5106 5972 6907 7400
## [9,] 658 985 1400 1911 2527 3259 4115 5106 6241 7300 8443 9046
## [10,] 770 1152 1637 2235 2956 3811 4813 5972 7300 8809 10188 10916
## [11,] 890 1333 1893 2585 3418 4408 5566 6907 8443 10188 12158 13026
## [12,] 954 1428 2029 2769 3663 4723 5964 7400 9046 10916 13026 14177
## Look at ChickWeight
head(ChickWeight, 12)
## Grouped Data: weight ~ Time | Chick
## weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
## 7 106 12 1 1
## 8 125 14 1 1
## 9 149 16 1 1
## 10 171 18 1 1
## 11 199 20 1 1
## 12 205 21 1 1
## Look at the correlation matrix
round(cov2cor(v4[1:12, 1:12]), 2)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90 0.88 0.87 0.86 0.85
## [2,] 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90 0.88 0.87 0.86
## [3,] 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90 0.88 0.88
## [4,] 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90 0.89
## [5,] 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90
## [6,] 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.92
## [7,] 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.93
## [8,] 0.90 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.95
## [9,] 0.88 0.90 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.96
## [10,] 0.87 0.88 0.90 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.98
## [11,] 0.86 0.87 0.88 0.90 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.99
## [12,] 0.85 0.86 0.88 0.89 0.90 0.92 0.93 0.95 0.96 0.98 0.99 1.00
Some magic happened when we fitted the model using gls. The varPower option assumes that the variance on the diagonal is a power function of a variance covariate \(v\) (here the fitted values):
\[ \sigma^2(v) = \sigma^2 \, |v|^{2 \delta} \]
where \(\delta\) is the estimated power parameter.
fit4
## Generalized least squares fit by REML
## Model: weight ~ Time
## Data: ChickWeight
## Log-restricted-likelihood: -2037.874
##
## Coefficients:
## (Intercept) Time
## 33.874048 2.866917
##
## Correlation Structure: Continuous AR(1)
## Formula: ~Time | Chick
## Parameter estimate(s):
## Phi
## 0.9922111
## Variance function:
## Structure: Power of variance covariate
## Formula: ~fitted(.)
## Parameter estimates:
## power
## 2.481534
## Degrees of freedom: 578 total; 576 residual
## Residual standard error: 0.001508325
The correlation was modeled using corCAR1, a continuous-time first-order autoregressive process (https://en.wikipedia.org/wiki/Autoregressive_model): the correlation between two observations on the same Chick decays as \(\phi^{|t_i - t_j|}\).
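Both pieces of the fitted model can be checked by hand from the printed estimates (the numbers below are copied from the gls summary above): the diagonal of the marginal covariance follows the power variance function, and the correlations decay as a power of the corCAR1 parameter.

```r
## Estimates copied from the printed gls summary
sigma <- 0.001508325            # residual standard error
delta <- 2.481534               # varPower exponent
phi   <- 0.9922111              # corCAR1 parameter
b     <- c(33.874048, 2.866917) # intercept and slope

## Fitted value for the first observation (Time = 0)
mu0 <- b[1] + b[2] * 0
## Diagonal of the marginal covariance: sigma^2 * |fitted|^(2 * delta)
sigma^2 * abs(mu0)^(2 * delta)
## approximately 89, matching v4[1, 1]

## corCAR1: correlation between observations at Time = 0 and Time = 2
phi^2
## approximately 0.98, matching the printed correlation matrix
```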